Skip to content

Conversation

@nrc
Copy link
Contributor

@nrc nrc commented Jul 16, 2024

This is an early draft PR to demonstrate an Arrow change (see below for details).

Which issue does this PR close?

Closes #6327.

What changes are included in this PR?

This PR simply allows Interval types to be passed through to Arrow. It will require an Arrow upgrade to work since it depends on apache/arrow-rs#6071 (currently Arrow is patched for the CLI in this PR, this will only work if you have modified Arrow in a sibling folder to DataFusion, I'll remove this and update the PR when there is an Arrow release).

Are these changes tested?

Tested in Arrow. Tests to come here.

Are there any user-facing changes?

The extract function will now work with intervals.

@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jul 16, 2024
@nrc
Copy link
Contributor Author

nrc commented Aug 1, 2024

apache/arrow-rs#6071 has landed. It just missed the 52.2 release of Arrow, so hopefully will be in 53. Once that is released (scheduled for September), I'll resurrect and rebase this PR and add some tests.

Exact(vec![Utf8, Time64(Nanosecond)]),
Exact(vec![Utf8, Interval(YearMonth)]),
Exact(vec![Utf8, Interval(DayTime)]),
Exact(vec![Utf8, Interval(MonthDayNano)]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrc can we add

                    Exact(vec![Utf8, Duration(Second)]),
                    Exact(vec![Utf8, Duration(Millisecond)]),
                    Exact(vec![Utf8, Duration(Microsecond)]),
                    Exact(vec![Utf8, Duration(Nanosecond)]),
                    Exact(vec![Utf8View, Duration(Second)]),
                    Exact(vec![Utf8View, Duration(Millisecond)]),
                    Exact(vec![Utf8View, Duration(Microsecond)]),
                    Exact(vec![Utf8View, Duration(Nanosecond)]),

here, then convert durations to Intervals — presumably MonthDayNano is easiest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are added to this PR, Arrow PR coming soon

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrc nrc force-pushed the interval-extract branch from 8dd3c48 to fec0c9e Compare August 14, 2024 02:24
@github-actions github-actions bot added functions Changes to functions implementation and removed logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Aug 14, 2024
@nrc nrc force-pushed the interval-extract branch from fec0c9e to 905550d Compare August 23, 2024 03:08
@github-actions github-actions bot added core Core DataFusion crate common Related to common crate labels Aug 23, 2024
@nrc
Copy link
Contributor Author

nrc commented Aug 23, 2024

After some testing, I found a couple of bugs due to the seconds function making assumptions about date_part which don't hold for Interval/Duration. Calling date_part to get the nanos from an Interval or Duration will often return Null due to overflow, which will cause date_part/extract to return null, even if we should have ignored the nanos and just used the whole seconds (e.g, extracting seconds from a 1000000 second duration should return 1000000, not null). Furthermore, extract(second from interval '1 second') would return 2 because the one second was queried as seconds and nanoseconds.

Both those bugs are fixed, but because the formatting of Intervals has changed with the Arrow update, it is difficult to write tests, so those will come after the proper Arrow update.

@nrc
Copy link
Contributor Author

nrc commented Sep 17, 2024

Closed in favour of #12514

@nrc nrc closed this Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate core Core DataFusion crate functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extract from interval type failed

2 participants